Yonathan Efroni

Credit Assignment with Resets in Language Model Reasoning

May 26, 2026

Hack-Verifiable Environments: Towards Evaluating Reward Hacking at Scale

May 20, 2026

Structure Enables Effective Self-Localization of Errors in LLMs

Feb 02, 2026

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs

Oct 22, 2025

Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment

Sep 18, 2025

Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do

Mar 20, 2025

Aligned Multi Objective Optimization

Feb 19, 2025

Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank

Oct 01, 2024

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Jun 03, 2024

Generalizing Multi-Step Inverse Models for Representation Learning to Finite-Memory POMDPs

Apr 22, 2024